A Hybrid Algorithm for Matching Arabic Names
Abstract
This paper presents a new hybrid algorithm that combines token-based and character-based approaches. The basic Levenshtein approach is extended to a token-based distance metric. The metric is enhanced to set the proper granularity level of the algorithm's behavior: it smoothly maps a threshold on misspelling differences at the character level, and the importance of token-level errors in terms of a token's position and frequency. Experiments on a large Arabic dataset show that the proposed algorithm successfully overcomes many types of errors, such as typographical errors, omission or insertion of middle name components, omission of non-significant popular name components, and character variations due to different writing styles. Compared with other classical algorithms on the same dataset, the proposed algorithm was found to raise the minimum success level of the best tested algorithms while achieving higher upper limits.
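The abstract does not give the cost functions, so the following is only a minimal sketch of the general idea, not the authors' formulation. It runs a Levenshtein-style alignment over name tokens, where substituting one token for another is cheap when their character-level distance ratio falls under a threshold, and where inserting or omitting a token is weighted by its position and by how common it is. The names `hybrid_distance`, `token_sub_cost`, `token_gap_cost`, the threshold of 0.34, the position weights (1.0 for first/last, 0.5 for middle), and the `common` weight table are all assumptions made for illustration. Latin transliterations are used in the usage lines only to keep the example easy to read.

```python
from typing import Dict, List, Optional


def char_levenshtein(a: str, b: str) -> int:
    """Classic character-level Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def token_sub_cost(t1: str, t2: str, char_threshold: float) -> float:
    """Assumed rule: 0 if equal, the distance ratio if it is within the
    threshold (likely a misspelling), full cost 1.0 otherwise."""
    if t1 == t2:
        return 0.0
    ratio = char_levenshtein(t1, t2) / max(len(t1), len(t2))
    return ratio if ratio <= char_threshold else 1.0


def token_gap_cost(tokens: List[str], index: int,
                   common: Dict[str, float]) -> float:
    """Assumed rule: first and last components cost 1.0 to drop, middle
    ones 0.5; common components are scaled down by their weight."""
    position_weight = 1.0 if index in (0, len(tokens) - 1) else 0.5
    frequency_weight = common.get(tokens[index], 1.0)
    return position_weight * frequency_weight


def hybrid_distance(name1: str, name2: str, char_threshold: float = 0.34,
                    common: Optional[Dict[str, float]] = None) -> float:
    """Token-level edit distance with character-aware substitution cost."""
    common = common or {}
    s, t = name1.split(), name2.split()
    # First row: build name2 from nothing by inserting its tokens.
    prev = [0.0]
    for j in range(1, len(t) + 1):
        prev.append(prev[j - 1] + token_gap_cost(t, j - 1, common))
    for i in range(1, len(s) + 1):
        drop_s = token_gap_cost(s, i - 1, common)
        curr = [prev[0] + drop_s]
        for j in range(1, len(t) + 1):
            curr.append(min(
                prev[j] + drop_s,                               # omit s[i-1]
                curr[j - 1] + token_gap_cost(t, j - 1, common),  # insert t[j-1]
                prev[j - 1] + token_sub_cost(s[i - 1], t[j - 1],
                                             char_threshold),
            ))
        prev = curr
    return prev[-1]


# Usage: a missing middle component is penalized less than a missing first one.
d_middle = hybrid_distance("mohammed ahmed ali", "mohammed ali")
d_first = hybrid_distance("mohammed ahmed ali", "ahmed ali")
d_typo = hybrid_distance("mohammed ali", "mohamed ali")
```

Under these assumed weights, a one-letter misspelling or a dropped middle name yields a small distance, while a dropped first name yields a larger one. A real system would need to tune the threshold and weights on labeled data and normalize Arabic character variants before comparison.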
Similar resources
Machine Transliteration of Names in Arabic Text
We present a transliteration algorithm based on sound and spelling mappings using finite state machines. The transliteration models can be trained on relatively small lists of names. We introduce a new spelling-based model that is much more accurate than state-of-the-art phonetic-based models and can be trained on easier-to-obtain training data. We apply our transliteration algorithm to the translit...
Cross Linguistic Name Matching in English and Arabic
This paper presents a solution to the problem of matching personal names in English to the same names represented in Arabic script. Standard string comparison measures perform poorly on this task due to varying transliteration conventions in both languages and the fact that Arabic script does not usually represent short vowels. Significant improvement is achieved by augmenting the classic Leven...
An Improved Semantic Schema Matching Approach
Schema matching is a critical step in many applications, such as data warehouse loading, Online Analytical Processing (OLAP), data mining, the semantic web [2], and schema integration. The task is to find the semantic correspondences between the elements of two schemas. Recently, schema matching has attracted considerable interest in both research and practice. In this paper, we present a new impr...
Rule- and Dictionary-based Solution for Variations in Written Arabic Names in Social Networks, Big Data, Accounting Systems and Large Databases
This paper investigates the problem that some Arabic names can be written in multiple ways. When someone searches for only one form of a name, neither exact nor approximate matching is appropriate for returning the multiple variants of the name. Exact matching requires the user to enter all forms of the name for the search, and approximate matching yields names not among the variations of the o...